Mexico City
Neither Valid nor Reliable? Investigating the Use of LLMs as Judges
Evaluating natural language generation (NLG) systems remains a core challenge of natural language processing (NLP), further complicated by the rise of large language models (LLMs) that aim to be general-purpose. Recently, large language models as judges (LLJs) have emerged as a promising alternative to traditional metrics, but their validity remains underexplored. This position paper argues that the current enthusiasm around LLJs may be premature, as their adoption has outpaced rigorous scrutiny of their reliability and validity as evaluators. Drawing on measurement theory from the social sciences, we identify and critically assess four core assumptions underlying the use of LLJs: their ability to act as proxies for human judgment, their capabilities as evaluators, their scalability, and their cost-effectiveness. We examine how each of these assumptions may be challenged by the inherent limitations of LLMs, LLJs, or current practices in NLG evaluation. To ground our analysis, we explore three applications of LLJs: text summarization, data annotation, and safety alignment. Finally, we highlight the need for more responsible practices in the evaluation of LLJs, to ensure that their growing role in the field supports, rather than undermines, progress in NLG.
Measuring what Matters: Construct Validity in Large Language Model Benchmarks
Evaluating large language models (LLMs) is crucial for both assessing their capabilities and identifying safety or robustness issues prior to deployment. Reliably measuring abstract and complex phenomena such as 'safety' and 'robustness' requires strong construct validity, that is, having measures that represent what matters to the phenomenon. With a team of 29 expert reviewers, we conduct a systematic review of 445 LLM benchmarks from leading conferences in natural language processing and machine learning. Across the reviewed articles, we find patterns related to the measured phenomena, tasks, and scoring metrics which undermine the validity of the resulting claims. To address these shortcomings, we provide eight key recommendations and detailed actionable guidance to researchers and practitioners in developing LLM benchmarks.
How Mexican World Cup Stadiums Achieved FIFA's Environmental Certifications
Venues hosting the 2026 World Cup must meet high standards to obtain environmental certifications, but FIFA also requires that they use natural grass, which is water-intensive to maintain. Among the venues is Estadio Banorte, formerly called Azteca Stadium, in Mexico City. Because of their scale, soccer stadiums require a fair amount of energy and water. They also generate large volumes of waste, mainly plastics and food trash. For the 2026 World Cup, the first to be held in three countries and across 16 different stadiums, FIFA maintained the requirement that venues hold LEED environmental certifications, which measure performance in water, energy, and waste management.
Diana Flores
Flag football will step firmly onto the global stage at the 2028 Olympics in Los Angeles, and Diana Flores is one of the key figures who helped it get there. The Mexico City native began playing at age 8 and made her country's national team by 16.
I own 20 axolotls - people need to know they're not easy to look after
When Emma Honeyfield's daughter Amber asked for an axolotl for her birthday, Emma never imagined it would lead to a collection of 20. The 37-year-old bought her daughter's first axolotl, Stitch, in September and has since fallen in love with their calming nature. Emma said Amber, eight, had always been difficult to buy for, so when she asked for one for her birthday, she couldn't say no. And the family, from Tredegar, Blaenau Gwent, are far from alone in seeking out the amphibians, which are critically endangered and only found in lakes and wetlands in southern Mexico City. The animal's cute, smiling face and its appearance in the hugely popular Minecraft and Roblox games have led to an increase in the number of people keeping them as pets.
A Bayesian Perspective on the Role of Epistemic Uncertainty for Delayed Generalization in In-Context Learning
Qchohi, Abdessamed, Rossi, Simone
In-context learning enables transformers to adapt to new tasks from a few examples at inference time, while grokking highlights that this generalization can emerge abruptly only after prolonged training. We study task generalization and grokking in in-context learning using a Bayesian perspective, asking what enables the delayed transition from memorization to generalization. Concretely, we consider modular arithmetic tasks in which a transformer must infer a latent linear function solely from in-context examples and analyze how predictive uncertainty evolves during training. We combine approximate Bayesian techniques to estimate the posterior distribution and we study how uncertainty behaves across training and under changes in task diversity, context length, and context noise. We find that epistemic uncertainty collapses sharply when the model groks, making uncertainty a practical label-free diagnostic of generalization in transformers. Additionally, we provide theoretical support with a simplified Bayesian linear model, showing that asymptotically both delayed generalization and uncertainty peaks arise from the same underlying spectral mechanism, which links grokking time to uncertainty dynamics.
Mexico City's 'Xoli' Chatbot Will Help World Cup Tourists Navigate the City
The launch of "Xoli" adds to the technological efforts promoted by the federal government to turn the 2026 World Cup into an engine of development for the entire country. Xoli, the new chatbot, is named after the axolotl, a salamander with external gills. The Government of Mexico City has launched Xoli, a chatbot that will provide information on services, tourism, and cultural offerings. The platform was designed to meet the demand of the millions of visitors expected to arrive during the 2026 FIFA World Cup. However, the authorities say the tool will remain active once the sporting event is over, with the aim of promoting economic activities and facilitating access to public services in the capital.
Scientists stunned as 500-year-old 'miracle' image of Virgin Mary reveals impossible microscopic reflection
A mysterious detail hidden inside one of the world's most famous religious images may defy conventional explanation, proving it might just be a miracle. Scientists analyzing the Tilma of Guadalupe, a cactus-fiber cloak that Christians believe bears a miraculous image of the Virgin Mary, claimed they discovered at least 13 tiny human figures embedded within the eye. The reflections are so small they can only be seen through digital enlargement, yet researchers said they resemble witnesses present when the artifact was first revealed in the 16th century.